1. Title of Invention

A method for the estimation of a three-dimensional representation 

2. Claims

1. In a process for the three dimensional representation of a scene from a
plurality of two-dimensional images of the scene that depends on knowing
the amount of rotation involved in the viewpoints represented by a pair of
different images of the scene, the method for approximating the amount of
rotation involved comprising the steps of:

(a) determining a plurality of corresponding pairs of epipolar lines in a pair
of images of the scene assuming a specific amount of rotation between
the two viewpoints of the pair of images;

(b) preparing a histogram of the pixel intensities along each of the epipolar
lines;

(c) determining the sum of the squared differences of the pixel intensity
levels of the histograms of each pair of corresponding epipolar lines of
the two images;

(d) determining the total of such sums;

(e) repeating steps a, b, c and d for different amounts of assumed rotation;
and

(f) using the amount of assumed rotation that is associated with the
smallest total determined in step d.



2. The method of claim 1 in which the plurality of pairs of epipolar lines in
step a is at least fifty.

3. The method of claim 1 in which step a uses a gradient descent search in the
choice of the amount of the assumed rotation.

4. The method of claim 1 in which histogram normalization is first used to
compensate for variations in image brightness.

5. In a process for the three dimensional representation of a scene from a
plurality of two-dimensional images of the scene that depends on knowing
the amount of translation involved in the viewpoints represented by a pair
of different images of the scene, the method for approximating the amount
of translation involved comprising the steps of:

(a) determining a plurality of corresponding pairs of epipolar lines in a
pair of images of the scene assuming a specific amount of translation
between the two viewpoints of the pair of images;

(b) preparing a histogram of the pixel intensities along each of the epipolar
lines;

(c) determining the sum of the squared differences of the pixel intensity
levels of the histograms of each pair of corresponding epipolar lines of
the two images;

(d) determining the total of such sums;

(e) repeating steps a, b, c and d for different amounts of assumed
translation; and

(f) using the amount of translation assumed that is associated with the
smallest total determined in step d.

6. The method of claim 5 in which the plurality of pairs of epipolar lines in 
step a is at least fifty. 

7. The method of claim 5 in which step a uses a gradient descent search in the
choice of the amount of the assumed translation.

8. The method of claim 6 in which histogram normalization is first used to
compensate for variations in image brightness.

9. In a process for determining the egomotion of the viewpoint of a camera in
two frames of an image, the process of claim 1 for determining the rotational
component of the egomotion and the process of claim 5 for determining the
translational component of the egomotion.




(11) 8-320933

3. Detailed Description of Invention 

Field of the Invention 

This invention relates to computer vision and more particularly to the use of a
computer to develop a three-dimensional representation of a scene from two-dimensional
representations of the scene and other uses that depend on knowledge of changes
in orientation of different views of an object.

Background of the Invention 

In reconstructing a three-dimensional representation of a scene or object from
two-dimensional images of the scene or object, important parameters are the
changes in viewpoints of the different views of the scene. When two images of the
scene represent two views that involve unknown rotation and translation of the
camera recording the scene, to be termed ego-motion, such as might result from
noise, considerable computation is involved in making a faithful three-dimensional
reconstruction. A faithful three-dimensional reconstruction has utility in many
applications, such as estimation of travel in navigation, three-dimensional
representation of an object from two two-dimensional representations, and video
mosaicing, the integration of many views of different parts of a scene into a
single view of the total scene, such as is described in an article by R. Kumar
et al. entitled "Shape recovery from multiple views: a parallax based approach"
in the Proc. of ARPA Image Understanding Workshop, 1994.

The problem of estimating the ego-motion and structural form from two image
frames of a scene has long been studied in computer vision. There have been
primarily two distinct classes of structure-and-motion algorithms that have been
tried. The first is feature-based and assumes that there is a known number of
feature correspondences between the two frames. While few correspondences are
needed in theory to solve the structure-and-motion problem, this approach is very
sensitive to noise and many correspondences are in fact needed to stabilize the
solution. Moreover, it is often the case that no feature correspondences are known
a priori and finding these can be laborious.

The second approach involves a class of direct methods of motion-and-structure
estimation in which explicit feature correspondences are not required.
Solutions using this approach can be broadly categorized into two main subclasses.
One subclass approach to the problem is first to develop knowledge of the optical
flow field of the frames involved. The second subclass approach has been to exploit
the brightness-change constraint equation directly to develop solutions for motion
and structure, as is described in an article by B.K.P. Horn and E.J. Weldon, Jr.
entitled "Direct Methods for Recovering Motion," in Int. J. of Computer Vision,
vol. 2, 1988, pages 51-76.



The present invention involves a direct method for estimating the rotational
ego-motion between a pair of two-dimensional images or camera frames of a scene
that is based on a search through the three-dimensional rotational space that is
associated with the scene. This is possible if, and only if, there exist image
properties such that each hypothesized ego-motion can be evaluated relative to one
another so that a particular ego-motion can be identified as the most appropriate
one for use in the three-dimensional representation.

A feature of the invention is the novel use of the properties of intensity histograms
computed along epipolar lines that can be supposed to be corresponding. These
useful properties first depend on the assumption of constant image brightness
so that one can assume that the histograms of corresponding epipolar lines are
invariant (ignoring occlusions) and that the histograms of almost corresponding
epipolar lines are similar, this similarity being a function of the spatial correlation
present in the image. There are available techniques such as histogram
normalization that can be used to compensate for variations in image brightness and
thereby satisfy the assumption.

The property that the difference between two histograms of two epipolar lines
is a minimum when the two epipolar lines truly correspond and increases
monotonically with the degree of misalignment between two epipolar lines allows the
rotational motion between the two to be estimated in a straightforward manner
as a three-dimensional epipolar search.

Accordingly, the amount of rotation between two camera frames of the same
scene taken from two viewpoints that are spaced apart can be effectively
estimated as follows. First, it is assumed that a certain amount of pure rotation
was involved in the difference in viewpoints, and based on such assumption
epipolar lines are derived for the two frames by known methods. For each frame,
histograms of the pixel intensities along a number of corresponding epipolar lines
are prepared. The sum of squared differences is then determined between the
histograms of corresponding epipolar lines from the two frames for each of the
chosen number of epipolar lines of the two frames, and the total of these sums
serves as a figure of merit for the particular assumption of the amount of the
rotation. This process is repeated with different assumed amounts of rotation and
a suitable search, for example gradient descent or pyramidal, is carried out to
find the assumed rotation that gives the lowest value of the figure of merit. The
amount of rotation of such assumption is then treated as the actual amount of the
rotation in the further processing of the frames to derive three-dimensional
representations of the scene involved or other uses. In instances where the
separation or translation of the two viewpoints may be significant, it may be
desirable to approximate the amount of such separation or translation by repeating
the above procedure, or another suitable procedure, using instead assumptions as
to the separation, either after or before the above procedure for determining the
amount of rotation. In some instances, it may be preferable first to estimate the
translation and thereafter to estimate the rotation of the ego-motion.
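
By way of illustration only, the figure of merit just described can be sketched in
Python as follows. The sketch assumes that the pixel intensities along each epipolar
line have already been sampled into lists of integers in the range 0 to 255; the
function names `histogram`, `ssd` and `figure_of_merit` and the choice of 256 bins
are assumptions made for the example and do not come from the specification.

```python
from typing import Iterable, List, Sequence, Tuple


def histogram(intensities: Sequence[int], bins: int = 256) -> List[int]:
    """Count how many pixels on one line fall into each intensity bin."""
    counts = [0] * bins
    for value in intensities:
        counts[value] += 1
    return counts


def ssd(hist_a: Sequence[int], hist_b: Sequence[int]) -> int:
    """Sum of squared differences between two histograms."""
    return sum((a - b) ** 2 for a, b in zip(hist_a, hist_b))


def figure_of_merit(
    line_pairs: Iterable[Tuple[Sequence[int], Sequence[int]]]
) -> int:
    """Total of the per-pair sums over all supposedly corresponding lines."""
    return sum(ssd(histogram(a), histogram(b)) for a, b in line_pairs)
```

A smaller total indicates that the assumed motion places the line pairs closer to
true correspondence; lines holding the same intensities in a different order
contribute zero.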

The invention will be better understood from the following more detailed de- 
scription taken with the accompanying drawing. 

Detailed Description of the Invention



Before discussing in detail the practice of the invention, it will be helpful to provide
some background in epipolar geometry with the aid of FIG. 1. To this end we
begin with a brief review of some simple mathematics to describe the epipolar
relationship between two slightly different views of a scene. With perspective
projection, a projected point $P_a = [x_a\ y_a\ 1]^T$ in projection plane 12 of camera A
(not shown) can be the projection of a line 14 of three-dimensional points $P_a(z_a)$
of different depth $z_a$. We then have



$$P_a(z_a) = \begin{bmatrix} x_a z_a / f \\ y_a z_a / f \\ z_a \\ 1 \end{bmatrix}$$



where $f$ is the focal length. Projecting those points to the projection plane 16 of
camera B (not shown) gives a set of collinear points $P_b^*(z_a) = [x_b\ y_b\ 1]^T$ that will
form the epipolar line 18.



We also have 



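$$P_b^*(z_a) = \left( J\, T_{ab}\, P_a(z_a) \right)^* = \begin{bmatrix} \dfrac{f\,(z_a A + t_{14})}{z_a C + t_{34}} \\[2ex] \dfrac{f\,(z_a B + t_{24})}{z_a C + t_{34}} \\[2ex] 1 \end{bmatrix}$$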



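where $T_{ab}$ represents the coordinate transformation between the two cameras and

$$\begin{bmatrix} A \\ B \\ C \end{bmatrix} = \begin{bmatrix} t_{11} & t_{12} & t_{13} \\ t_{21} & t_{22} & t_{23} \\ t_{31} & t_{32} & t_{33} \end{bmatrix} \begin{bmatrix} x_a / f \\ y_a / f \\ 1 \end{bmatrix}$$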



and $t_{ij}$ is element $(i, j)$ of matrix $T_{ab}$. The projection matrix $J$ is defined as

$$J = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$$



and $P^*$ is the projective coordinate representation of a point $P$. If $P = [u\ v\ w]^T$,
then the homogeneous Euclidean coordinate $P^*$ is $[u/w\ v/w\ 1]^T$.
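
The projection above can be sketched in Python as follows, purely as an
illustration. The sketch assumes that the coordinate transformation is supplied as
the three rows of a 3x4 array (rotation block followed by a translation column);
the names `project_to_camera_b` and `epipolar_samples` are hypothetical.

```python
from typing import List, Sequence, Tuple


def project_to_camera_b(x_a: float, y_a: float, z_a: float, f: float,
                        t_ab: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Map image point (x_a, y_a) of camera A, assumed at depth z_a, into image B."""
    # Back-project to a homogeneous three-dimensional point in camera A coordinates.
    p = [x_a * z_a / f, y_a * z_a / f, z_a, 1.0]
    # Apply the first three rows of the camera-to-camera transformation.
    q = [sum(row[j] * p[j] for j in range(4)) for row in t_ab[:3]]
    # Apply the projection matrix and normalize by the third coordinate.
    return (f * q[0] / q[2], f * q[1] / q[2])


def epipolar_samples(x_a: float, y_a: float, depths: Sequence[float], f: float,
                     t_ab: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Sample the epipolar line in image B by sweeping the assumed depth."""
    return [project_to_camera_b(x_a, y_a, z, f, t_ab) for z in depths]
```

For a pure sideways translation the sampled points fall on a horizontal line in
image B, and they crowd together as the assumed depth grows.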

The displacement of the image point $P_a$ can be decomposed into two components.
The first component is the rotational part of the displacement and is defined as

$$M_a = \lim_{z_a \to \infty} P_b^*(z_a) - P_a \qquad (1)$$

while the second component is the epipolar vector, or translational part of the
displacement, and is defined as

$$E_a = P_b^*(z_{min}) - \lim_{z_a \to \infty} P_b^*(z_a) \qquad (2)$$

where $z_{min}$ is the minimum depth expected for $P_a$. These components can be used
to derive the simple relation

$$P_b^*(z_a) = P_a + M_a + e\,E_a, \qquad 0 \le e \le 1 \qquad (3)$$

Equations (1) and (3) indicate that the rotational displacement is independent of
distance while the translational displacement shifts points along the epipolar
line by amounts that are inversely proportional to distance, as illustrated in FIG. 2.
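
The decomposition can be checked numerically with the following Python sketch,
offered only as an illustration. It takes the rotational part to be the
displacement of the image point when its depth tends to infinity and the epipolar
vector to run from that infinite-depth position to the position at the minimum
depth. The closed form used for the scale factor `e` is derived here from the
projection formula and is an assumption of the sketch, as is the function name
`decompose`.

```python
from typing import Sequence, Tuple

Vec = Tuple[float, float]


def decompose(x_a: float, y_a: float, z_a: float, z_min: float, f: float,
              t_ab: Sequence[Sequence[float]]) -> Tuple[Vec, Vec, float, Vec]:
    """Return (rotational part, epipolar vector, e, directly projected point)."""
    # A, B, C: rotation block applied to the viewing ray [x_a/f, y_a/f, 1].
    a, b, c = (r[0] * x_a / f + r[1] * y_a / f + r[2] for r in t_ab[:3])
    t14, t24, t34 = (r[3] for r in t_ab[:3])

    def point_b(z: float) -> Vec:
        # Normalized projection into image B for an assumed depth z.
        return (f * (z * a + t14) / (z * c + t34),
                f * (z * b + t24) / (z * c + t34))

    far = (f * a / c, f * b / c)            # position for infinite depth
    rotational = (far[0] - x_a, far[1] - y_a)
    near = point_b(z_min)
    epipolar = (near[0] - far[0], near[1] - far[1])
    # Fraction of the epipolar vector covered at depth z_a (derived, see lead-in).
    e = (z_min * c + t34) / (z_a * c + t34)
    return rotational, epipolar, e, point_b(z_a)
```

Adding the rotational part and `e` times the epipolar vector to the original image
point reproduces the directly projected point, with `e` equal to 1 at the minimum
depth and tending to 0 as the depth grows.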

A more detailed discussion of epipolar geometry is provided by a paper entitled
"Epipolar-Plane Image Analysis: A Technique for Analyzing Motion Sequences"
by R.C. Bolles and H.H. Baker that appeared in Proc. IEEE 3rd Workshop
on Computer Vision: Representation and Control, pp. 168-178 (1985), and such
paper is incorporated herein by reference. With this background, we can lay a
theoretical basis for the invention.



Consistent with the earlier mentioned first property of histograms, if we assume
(1) that the constant brightness constraint applies, i.e. the brightness of an image
point is unchanged by the motion of the camera, and (2) that the number
of occlusions is small, then it is clearly the case that the histograms of the
intensities of two corresponding epipolar lines are identical, since the two lines
contain essentially identical pixel intensities; only their position may be changed
because of depth.
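
A minimal Python illustration of this first property follows. The pixel values
are invented for the example and the helper name `histograms_match` is
hypothetical; the point shown is only that a histogram ignores where along the
line each intensity occurs.

```python
from collections import Counter
from typing import Sequence


def histograms_match(line_a: Sequence[int], line_b: Sequence[int]) -> bool:
    """True when both lines hold the same intensities, in any order."""
    return Counter(line_a) == Counter(line_b)


# The same pixels, displaced along the line by depth-dependent amounts.
line_in_a = [10, 10, 40, 200, 200, 200]
line_in_b = [200, 10, 200, 40, 10, 200]
```

Here `histograms_match(line_in_a, line_in_b)` holds even though no pixel keeps its
position, which is why no feature correspondence along the line is needed.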

Now we consider the case in which the camera motion contains a small change,
either in its rotational or translational component, as represented in FIG. 2. As
a consequence, the "epipolar" lines of Equation (3) above will be erroneous, but
close to the true epipolar lines.

This now prepares us for use of the second property of histograms mentioned
earlier. Assuming (1) that the constant brightness constraint applies, and (2) that
the number of occlusions is small, then the intensity histograms of two
"pseudo-epipolar" lines that are spatially close to a pair of truly corresponding
epipolar lines are similar (in a sum of squared errors sense). The difference
between two pseudo-epipolar histograms is a minimum when the lines correspond
to the true epipolar geometry and increases approximately monotonically with
the size of the rotational error.

That this property applies generally to natural images can be deduced as
follows. It is well known that image intensities are spatially highly correlated. As
depicted in FIG. 3, small errors in the camera displacement cause a point $P_a$
in image A to be projected to a point which is spatially close to the true epipolar
line $E_{P_a}$. The smaller the error, the closer this point is to $E_{P_a}$. Local image
coherence then ensures that the intensity value of an erroneous correspondence is
close to the true intensity value that lies somewhere on the true epipolar line.

While it is easy to construct artificial images for which the second property does 
not hold, these images are never natural. For example, an image of a rotation- 
ally invariant circle would not allow the z component of rotation to be estimated. 
However, in general, we believe this property to hold for a large class of images.

By comparing the effects of translational error and rotational error (FIG. 3A
and FIG. 3B, respectively), it can be shown that translational error usually
creates less displacement from the true epipolar line than rotational error. This is
due to the fact that the displacement magnitude from translational error is
"inversely scaled" by the minimum depth of the objects in the scene, while the
displacement from rotational error is not (see Equations (1) and (2)).

This implies that if the objects are not too close, the rotational error always
has a much bigger impact than translational error. In the limit case where all
objects are in the background (at infinity), the translation error does not create
any displacement at all.
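
The contrast can be made concrete with a short Python sketch, offered under
simplifying assumptions that are not part of the specification: the point
considered lies at the image centre, the translational error is purely sideways,
and the rotational error is about the vertical axis. Under those assumptions the
projection formula reduces to the two closed forms below; the function names are
hypothetical.

```python
import math


def shift_from_translation_error(delta_t: float, depth: float, f: float = 1.0) -> float:
    """Image shift of the centre point for a sideways translation error."""
    # With no rotation, the centre point projects to f * delta_t / depth.
    return f * delta_t / depth


def shift_from_rotation_error(delta_theta: float, f: float = 1.0) -> float:
    """Image shift of the centre point for a rotation error about the vertical axis."""
    # With no translation, the depth cancels and the shift is f * tan(delta_theta).
    return f * math.tan(delta_theta)
```

For an error of 0.01 in each quantity, the rotational shift stays near 0.01 at
every depth, while the translational shift falls to about 0.0001 at a depth of 100
and vanishes as the depth tends to infinity.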

One can derive an important conclusion from this relation. The translational
error generally creates a "negligible" amount of displacement from the true
epipolar line. Thus one can assume in the usual case that rotational error causes all
point displacement. There will be discussed later a suitable approach for the
unusual case.

With this theoretical basis as a background, we can now proceed to a
description of the process of the invention.

FIG. 4 is a flow chart of the process for determining the unknown amount of
rotation between two frames of a scene taken either by two cameras that are
spaced apart or one camera that has been moved to record the two frames. For
this process, constant image brightness, which is the more typical case, is being
assumed. As depicted in the flow chart, the first step 41 is to assume a likely
value for the rotation and on this basis derive corresponding epipolar lines of the
two frames. One would derive a number of such lines, typically at least one
quarter of the lines in the frame and preferably about as many lines as were used in
the frame, the accuracy generally improving the greater the number, because of
the reduced sensitivity to noise this achieves. Then, as a second step 42, there
are prepared histograms of the pixel intensities along the selected corresponding
epipolar lines of the two frames. Then, as a next step 43, for each of the
pairs of corresponding epipolar lines, in turn there is separately derived from the
histograms of such pairs of lines the sum of squared differences. Then, as step 44,
the total of these sums of squared differences for all of the pairs is determined for
use as a figure of merit of the assumed amount of rotation. The process is then
repeated to derive a figure of merit for a different assumed amount of rotation. If
the second figure of merit is smaller than the first, the process is repeated with a
still larger assumed amount of rotation. If the second figure of merit was larger
than the first, the process is repeated with an assumed amount smaller than the
original amount. In similar fashion in a gradient-descent search, the process is
repeated until one finds the rotation that yields the minimum or near minimum
of the figure of merit. The amount of rotation that yielded such minimum is
essentially the true amount of the rotation. Once the amount of rotation is known,
this can be used in known fashion in conjunction with the two frames of the scene
to construct a quite accurate three-dimensional representation of the scene.
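
The repeat-and-compare loop of FIG. 4 can be sketched in Python as follows. This
is a simplified stand-in, not the search of the invention: it treats the rotation
as a single scalar, uses step halving in place of a true gradient, and takes the
figure of merit as a caller-supplied function. The name `descend` and its
parameters are assumptions made for the example.

```python
from typing import Callable


def descend(merit: Callable[[float], float], start: float, step: float,
            min_step: float = 1e-4, max_iter: int = 1000) -> float:
    """Move the assumed rotation toward smaller figures of merit."""
    best, best_value = start, merit(start)
    for _ in range(max_iter):
        if step < min_step:
            break
        improved = False
        # Try a larger and then a smaller assumed rotation.
        for candidate in (best + step, best - step):
            value = merit(candidate)
            if value < best_value:
                best, best_value, improved = candidate, value, True
                break
        if not improved:
            # Neither direction helped, so refine the step size.
            step /= 2.0
    return best
```

In practice the rotation has three components, so the same idea would be applied
per component or replaced by a proper multi-dimensional descent.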



Alternatively, a pyramidal search can be used in which one begins with a coarse
search to find an approximate value and follows it up with finer and finer searches
centered about the narrowed region delimited by the previous search.
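
A coarse-to-fine search of this kind might be sketched in Python as follows,
again for a single scalar rotation and a caller-supplied figure of merit. The name
`coarse_to_fine`, the number of samples per level and the number of levels are
assumptions of the sketch.

```python
from typing import Callable


def coarse_to_fine(merit: Callable[[float], float], low: float, high: float,
                   samples: int = 9, levels: int = 5) -> float:
    """Sample the range evenly, then re-search around the best sample."""
    best = low
    for _ in range(levels):
        step = (high - low) / (samples - 1)
        candidates = [low + i * step for i in range(samples)]
        best = min(candidates, key=merit)
        # Narrow the range to one step on either side of the best sample.
        low, high = best - step, best + step
    return best
```

With nine samples per level the searched range shrinks by a factor of four at each
level, so a handful of levels gives a fine resolution at modest cost.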






In order to ensure that the images satisfy the constant image brightness
assumption, the two images can be first normalized by a process of histogram
normalization, which is described in an article by I.J. Cox entitled "A Maximum Likelihood
N-Camera Stereo Algorithm" published in the Proceedings of the Int. Conf. Computer
Vision & Pattern Recognition (1994), pages 733-739, or histogram
specification, which is described in an article by Gonzalez and Wintz entitled "Digital
Image Processing".
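
For illustration, histogram specification in its common textbook form can be
sketched in Python as below. This is not the procedure of the cited Cox article;
it simply remaps one image's intensities so that its cumulative histogram follows
that of a reference image. The function name `match_histogram` and the flat-list
image representation are assumptions of the sketch.

```python
from typing import List, Sequence


def match_histogram(source: Sequence[int], reference: Sequence[int],
                    levels: int = 256) -> List[int]:
    """Remap source intensities so their cumulative histogram tracks the reference."""

    def cdf(pixels: Sequence[int]) -> List[float]:
        counts = [0] * levels
        for p in pixels:
            counts[p] += 1
        running, out = 0, []
        for c in counts:
            running += c
            out.append(running / len(pixels))
        return out

    src_cdf, ref_cdf = cdf(source), cdf(reference)
    mapping, j = [], 0
    for i in range(levels):
        # Advance to the first reference level whose CDF reaches the source CDF.
        while j < levels - 1 and ref_cdf[j] < src_cdf[i]:
            j += 1
        mapping.append(j)
    return [mapping[p] for p in source]
```

Applied to a frame that is uniformly brighter or darker than its partner, the
remapping brings the two intensity distributions into agreement before the
epipolar histograms are compared.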

It can be appreciated that while FIG. 4 has been described as a flow chart of
the process practiced by the invention, it can also serve as a block diagram of
hardware components of apparatus designed to carry out the steps that are set
forth. In particular, each of the blocks could be a special purpose computer
designed to carry out the operating step prescribed for it.

As was previously mentioned, in the above procedure it has been assumed
that any translational motion of the camera in the two views could be ignored as
having a negligible effect on determining the rotational motion. In some instances,
one may begin by assuming that the motion is entirely of one type, for example
rotational, and proceed in the manner discussed to derive an approximation of
such rotational motion. This could then be followed by use of the same general
approach, using the rotational approximation found as the fixed value of such
motion, to get an approximation of the translational motion. There are available
techniques for estimating the translational motion once the rotational motion is
known. In instances when especially high accuracy is desired, there can now
be derived a new approximation of the rotational motion, using the last discovered
approximation of the translational motion to derive an improved approximation
of the rotational motion. In this fashion, by successive approximations, a very high
degree of accuracy should be obtainable.
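
The successive-approximation scheme can be sketched in Python as a simple
alternation. The two estimator arguments are hypothetical callables standing for
whatever rotation and translation estimators are used; the name `alternate` and
the fixed number of rounds are likewise assumptions of the sketch.

```python
from typing import Callable, Tuple, TypeVar

R = TypeVar("R")
T = TypeVar("T")


def alternate(estimate_rotation: Callable[[T], R],
              estimate_translation: Callable[[R], T],
              initial_translation: T,
              rounds: int = 3) -> Tuple[R, T]:
    """Alternately re-estimate rotation and translation, each with the other fixed."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    translation = initial_translation
    for _ in range(rounds):
        # Rotation is estimated with the latest translation held fixed.
        rotation = estimate_rotation(translation)
        # Translation is then re-estimated with that rotation held fixed.
        translation = estimate_translation(rotation)
    return rotation, translation
```

Starting from zero translation reproduces the pure-rotation assumption of the
first pass; later rounds refine each estimate using the other.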

The construction of a three dimensional representation of an object from a pair
of two-dimensional representations of the object is described in Chapter 6, Stereo
Vision, pp. 165-240 of a book entitled "Three-Dimensional Computer Vision" by
Olivier Faugeras published by the MIT Press, Cambridge, Massachusetts (1993).

It should be understood that the specific embodiments described are illustrative
of the general principles of the invention. In particular it should be appreciated
that there are other applications where it is important to know the amount of
rotation or translation of a camera that is involved between different frames of an
object or scene. For example, there are navigational applications in which a camera
mounted in a robot or on a vehicle takes successive frames of a scene as the robot
or vehicle moves past a scene to determine its position, and knowledge of the
rotation or translation of the camera is important to such determination.



4. Brief Description of Drawings



FIG. 1 will be helpful in a preliminary discussion of epipolar geometry.

FIG. 2 illustrates rotational and translational components of the displacement
in an epipolar line resulting from some camera motion.

FIGS. 3A and 3B illustrate errors in epipolar lines for inaccurate translation and
inaccurate rotation, respectively.

FIG. 4 is a flow diagram of the basic procedure used in the invention.

1. Abstract



FIG. 4 



A technique for compensating for egomotion of the camera used to record a pair of
two-dimensional views of a scene when the pair of images is to be used to provide
a three-dimensional representation of the scene. The technique involves comparing
histograms of the intensity levels of pixels of corresponding epipolar lines in
the pair of images for assumed amounts of egomotion to identify the amount that
results in the smallest total of the sums of squared differences of the histograms.



2. Representative Drawing
FIG. 1 